Elementary effects for models with dimensional inputs of arbitrary type and range: Scaling and trajectory generation

The Elementary Effects method is a global sensitivity analysis approach for identifying (un)important parameters in a model. However, it has almost exclusively been used where inputs are dimensionless and take values on [0, 1]. Here, we consider models with dimensional inputs, inputs taking values on arbitrary intervals or discrete inputs. In such cases scaling effects by a function of the input range is essential for correct ranking results. We propose two alternative dimensionless sensitivity indices by normalizing the scaled mean or median of absolute effects. Testing these indices with 9 trajectory generation methods on 4 test functions (including the Penman-Monteith equation for evapotranspiration) reveals that: i) scaled elementary effects are necessary to obtain correct parameter importance rankings; ii) small step-size methods typically produce more accurate rankings; iii) it is beneficial to compute and compare both sensitivity indices; and iv) spread and discrepancy of the simulation points are poor proxies for trajectory generation method performance.


S2 Discrepancy
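Following Morokoff and Caflisch [1], the discrepancy of a sequence $\{x^{(i)}\}_{i=1}^{N}$ in $[0,1]^d$ is built on the local discrepancy of a sub-region $J \subseteq [0,1]^d$,

$$R_N(J) = \frac{1}{N}\,\#\{x^{(i)} \in J\} - \mathrm{Vol}(J),$$

where $\#\{x^{(i)} \in J\}$ is the number of points in $J$ and $\mathrm{Vol}(J)$ is the volume of $J$. $R_N(J)$ gives the deviation of the sequence from complete uniformity in the sub-region $J$. Different kinds of discrepancies can then be obtained by restricting the sub-region $J$ to a certain class of sets and by taking a certain norm of $R_N$ over this class [1].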
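The $L_\infty$ (or sup) and $L_2$ discrepancy of a sequence $\{x^{(i)}\}_{i=1}^{N}$ are defined as

$$D_\infty = \sup_{J \in E} |R_N(J)|$$

and

$$T_2 = \left[\int_{(x,y) \in [0,1]^{2d},\; x_j < y_j} R_N\big(J(x,y)\big)^2 \, dx\, dy\right]^{1/2},$$

respectively, where $E$ is the set of all sub-rectangles $J(x,y) = \prod_{j=1}^{d} [x_j, y_j)$ of $[0,1]^d$. Similarly, the star variants ($D^\star_\infty$ and $T^\star_2$) are defined by restricting the sub-region $J$ to $E^\star$, the class of sub-rectangles with a corner at 0, i.e.

$$D^\star_\infty = \sup_{J \in E^\star} |R_N(J)|, \qquad T^\star_2 = \left[\int_{[0,1]^d} R_N\big(J(0,y)\big)^2 \, dy\right]^{1/2}.$$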
In practice, it is typically not feasible to calculate the $L_\infty$ measure (Eq. (S6)). Even in the case of OT, there are $[r(k+1)]^k$ sub-regions to consider. It is therefore common to use an $L_2$-based discrepancy, the main advantage being that closed expressions are readily available (Eq. (S10)-(S13)).
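The cost argument above can be made concrete with a small sketch. The code below is illustrative only and is not the implementation used in this work: the function names `local_discrepancy` and `star_discrepancy_bruteforce` are hypothetical, and the search is restricted to anchored boxes whose upper corner lies on the grid spanned by the point coordinates (plus 1), evaluating both the open and the closed box at each corner.

```python
import itertools
import numpy as np

def local_discrepancy(points, upper, closed=False):
    """R_N(J) for the anchored box J = [0, upper) (or [0, upper] if closed)."""
    points = np.asarray(points, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = points <= upper if closed else points < upper
    fraction = np.mean(np.all(inside, axis=1))  # share of points inside J
    return fraction - np.prod(upper)            # minus Vol(J)

def star_discrepancy_bruteforce(points):
    """Largest |R_N(J)| over anchored boxes with corners on the point grid.

    Returns the value found and the number of corners visited, which
    grows like (N + 1)^d and is the reason this is rarely feasible.
    """
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    # Candidate corner coordinates per dimension: point coordinates and 1.0.
    axes = [np.unique(np.append(points[:, j], 1.0)) for j in range(d)]
    worst, n_boxes = 0.0, 0
    for corner in itertools.product(*axes):
        n_boxes += 1
        worst = max(
            worst,
            -local_discrepancy(points, corner, closed=False),  # too few points
            local_discrepancy(points, corner, closed=True),    # too many points
        )
    return worst, n_boxes
```

Because the loop visits every grid corner, the number of evaluations grows exponentially with the number of inputs, which is what motivates the closed-form $L_2$ alternatives.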
Besides $T_2$ and $T^\star_2$, two other commonly used $L_2$ discrepancies are the centered ($C_2$) and wrap-around ($W_2$) discrepancy [3,4,5]. Figure S1 depicts the different kinds of sub-regions (i.e. restrictions of $J$) for each of the $L_2$-based discrepancies. The $W_2$ discrepancy is less sensitive to boundary effects because it wraps the hypercube around in each dimension [5]. For that reason, the wrap-around discrepancy is used in this work: (E)OT and the 'pinned' methods naturally generate many parameter points on the boundary of the hypercube (i.e. $x_i = 0$ or $x_i = 1$). Indeed, from the closed expressions for $T_N$ (Eq. (S10)) and $T^\star_N$ (Eq. (S11)) it is clear that they are not suitable for examining the uniformity of such simulation point sets; by construction there is always at least one term in each product that will vanish if the number of levels $p = 4$ and the optimal step size $|\delta| = p/(2[p-1])$ are chosen.
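As an illustration of why the wrap-around measure copes with boundary points, the sketch below evaluates a commonly quoted closed form of the squared wrap-around $L_2$ discrepancy, $-(4/3)^d + N^{-2}\sum_{i,j}\prod_m [3/2 - |x^{(i)}_m - x^{(j)}_m|(1 - |x^{(i)}_m - x^{(j)}_m|)]$. This form is assumed here and should be checked against the closed expressions referenced above; the function name `wraparound_l2_discrepancy` is hypothetical.

```python
import numpy as np

def wraparound_l2_discrepancy(points):
    """Wrap-around L2 discrepancy of points in [0, 1]^d (assumed closed form)."""
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    # Pairwise coordinate distances, shape (n, n, d).
    diff = np.abs(points[:, None, :] - points[None, :, :])
    # The kernel is symmetric under diff -> 1 - diff, i.e. it wraps around.
    kernel = np.prod(1.5 - diff * (1.0 - diff), axis=2)
    squared = -(4.0 / 3.0) ** d + kernel.sum() / n**2
    return np.sqrt(max(squared, 0.0))  # guard against rounding below zero

# Shifting all points modulo 1 leaves the value unchanged, and points at
# 0 and 1 are treated as coincident, so boundary points get no special role.
rng = np.random.default_rng(1)
sample = rng.random((20, 3))
shifted = (sample + 0.37) % 1.0
print(wraparound_l2_discrepancy(sample), wraparound_l2_discrepancy(shifted))
```

The kernel depends on each coordinate distance only through $|x - y|(1 - |x - y|)$, so a distance of 1 contributes the same as a distance of 0.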
Typically the expected discrepancy of a uniform random sample is used as a benchmark (see [2] for a closed expression for the $W_2$ benchmark): if the discrepancy of the QR sequence is significantly lower than the benchmark, the sequence is deemed good; if there is no significant decrease in discrepancy, or even an increase compared to the benchmark, the sequence is deemed poor. In all cases, the discrepancies of our point sets are much larger than the benchmark. In other words, our trajectories cover the input space less uniformly than a completely random sample. This is caused by the inherent clustering in the form of trajectories (OT) or stars (radial) and, in the case of OT or radial with integer/Boolean inputs, the fact that an input can only take one of $p_i$ discrete values. One thus cannot use this benchmark to assess the quality of the set of trajectories.
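For orientation, the benchmark can be derived in a few lines if one assumes the commonly quoted closed form of the squared wrap-around discrepancy and $N$ independent points that are uniform on $[0,1]^d$. The derivation below is a sketch under those assumptions and may differ in form from the expression given in [2].

```latex
% Assumed closed form of the squared wrap-around L2 discrepancy:
\[
W_2^2 = -\left(\tfrac{4}{3}\right)^{d}
  + \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\prod_{m=1}^{d}
    \left[\tfrac{3}{2} - \bigl|x^{(i)}_m - x^{(j)}_m\bigr|
          \left(1 - \bigl|x^{(i)}_m - x^{(j)}_m\bigr|\right)\right].
\]
% For independent U, V ~ Uniform[0,1]: E|U-V| = 1/3 and E|U-V|^2 = 1/6, so
\[
\mathbb{E}\left[\tfrac{3}{2} - |U-V|\,(1-|U-V|)\right]
  = \tfrac{3}{2} - \tfrac{1}{3} + \tfrac{1}{6} = \tfrac{4}{3}.
\]
% The N diagonal terms (i = j) each equal (3/2)^d; the N(N-1)
% off-diagonal terms each have expectation (4/3)^d. Hence
\[
\mathbb{E}\left[W_2^2\right]
  = -\left(\tfrac{4}{3}\right)^{d}
    + \frac{1}{N}\left(\tfrac{3}{2}\right)^{d}
    + \frac{N-1}{N}\left(\tfrac{4}{3}\right)^{d}
  = \frac{1}{N}\left[\left(\tfrac{3}{2}\right)^{d} - \left(\tfrac{4}{3}\right)^{d}\right].
\]
```

Under these assumptions the expected squared discrepancy decays as $1/N$, so a point set whose squared $W_2$ value lies well above this level is more clustered than a random sample of the same size.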

S3 Sobol total sensitivity indices
Given that the factors are independent, the output variance $V(Y)$ for a model output $Y$ with $k$ scaled dimensionless input factors can be decomposed as [6]:

$$V(Y) = \sum_{i} V_i + \sum_{i}\sum_{j>i} V_{ij} + \ldots + V_{12\ldots k},$$

where the first two terms are given by

$$V_i = V_{X_i}\big(E_{X_{\sim i}}(Y \mid X_i)\big), \qquad V_{ij} = V_{X_i X_j}\big(E_{X_{\sim ij}}(Y \mid X_i, X_j)\big) - V_i - V_j,$$

and the higher orders can be derived similarly [7]. Here the subscript $X_{\sim i}$ denotes that the mean is taken over all factors except $X_i$. $V_i$ can be interpreted as the expected reduction in variance that would be obtained if $X_i$ could be fixed. The associated sensitivity coefficients for the first two orders are [6]:

$$S_i = \frac{V_i}{V(Y)}, \qquad S_{ij} = \frac{V_{ij}}{V(Y)};$$

higher order indices are derived in a similar way. Note that these coefficients are normalized and sum to unity, i.e., $\sum_i S_i + \sum_i \sum_{j>i} S_{ij} + \ldots + S_{12\ldots k} = 1$.

Alternatively, the total effect index, here also referred to as Sobol total (sensitivity) index, $S_{T_i}$ measures the total effect, i.e. first order and interactions, of input $X_i$ [6]. It is given by

$$S_{T_i} = 1 - \frac{V_{X_{\sim i}}\big(E_{X_i}(Y \mid X_{\sim i})\big)}{V(Y)}.$$

One way of interpreting this quantity is by noting that $V_{X_{\sim i}}\big(E_{X_i}(Y \mid X_{\sim i})\big)$ is the first order effect of $X_{\sim i}$, so $V(Y)$ minus this quantity must give the contribution of all terms in the variance decomposition which do include $X_i$ (see [6,7] for more detail).

The total sensitivity index $S_{T_i}$ is linked to the EE absolute mean effect $\mu^\star_i$ in the following way [7]. $\mu^\star_i$ is an approximation of the functional $\bar{\mu}_i = \int_\Omega |\partial f/\partial x_i| \, dx$, where $f$ is the output of interest. In [7] it is shown that $S_{T_i} \le C \bar{\mu}_i / (\pi^2 V)$, where $|\partial f/\partial x_i| \le C$ and $V$ is the total variance of $f(x)$. Hence, small $\mu^\star_i$ imply small $S_{T_i}$. The reverse is not necessarily true, i.e. ranking factors based on $S_{T_i}$ might give different results [7]. Nevertheless, we expect that sampling strategies that are able to accurately estimate total sensitivity indices will also be able to accurately rank parameters (based on e.g. [9]).
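A small numerical sketch may help to connect these quantities. It estimates the total index with the Jansen-type estimator $E[(f(A) - f(A_B^{(i)}))^2] / (2V(Y))$, which is swapped in here for illustration and is not necessarily the estimator used in this work, and then checks the bound $S_{T_i} \le C\bar{\mu}_i/(\pi^2 V)$ on a hypothetical linear test model $f(x) = \sum_i c_i x_i$ with inputs uniform on $[0,1]$. For this model $S_{T_i} = c_i^2/\sum_j c_j^2$, $\bar{\mu}_i = C = |c_i|$ and $V = \sum_j c_j^2/12$. All names (`total_indices_jansen`, `coeffs`) are assumptions made for the example.

```python
import numpy as np

def total_indices_jansen(model, k, n, rng):
    """Estimate Sobol total indices for k independent U[0, 1] inputs."""
    a = rng.random((n, k))          # base sample A
    b = rng.random((n, k))          # independent sample B
    y_a = model(a)
    variance = np.var(y_a, ddof=1)  # estimate of V(Y)
    s_t = np.empty(k)
    for i in range(k):
        ab_i = a.copy()
        ab_i[:, i] = b[:, i]        # resample only factor i
        # Half the mean squared change estimates E[V(Y | X_~i)].
        s_t[i] = 0.5 * np.mean((y_a - model(ab_i)) ** 2) / variance
    return s_t

# Hypothetical linear test model with known analytic total indices.
coeffs = np.array([1.0, 2.0, 3.0])

def linear_model(x):
    return x @ coeffs

s_t_est = total_indices_jansen(linear_model, k=3, n=20000,
                               rng=np.random.default_rng(0))
s_t_exact = coeffs**2 / np.sum(coeffs**2)

# Upper bound C * mu_bar / (pi^2 * V) with C = mu_bar = |c_i|.
total_variance = np.sum(coeffs**2) / 12.0
bound = np.abs(coeffs) * np.abs(coeffs) / (np.pi**2 * total_variance)

print("estimated:", s_t_est)
print("analytic: ", s_t_exact)
print("bound:    ", bound)
```

For the linear model the bound exceeds the exact index by the constant factor $12/\pi^2 \approx 1.22$, so it holds but is not tight; for models with interactions the gap can be larger.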

Analytical values of Sobol indices for the test functions
The analytic $S_{T_i}$ for the K-function (Eq. (41)) can be shown to equal

Note that Saltelli et al. [6] plot 'total cost' on the horizontal axis, which is defined as $r(k+1)$ (regardless of the number of experiment replicates). To increase reproducibility, we show the number of trajectories per replicate in Figures 8-9.

Figure S1: Visual interpretation of different kinds of $L_2$ discrepancies. The blue-shaded area depicts the sub-region $J$ for that discrepancy. Figure adapted from [2].